class: center, middle, inverse, title-slide

# Introduction to Survey Data Cleaning Using Tidyverse in R
## Introduction
### Johannes Breuer
Stefan Jünger
### 2021-07-22

---

layout: true

<div class="my-footer">
 <div style="float: left;"><span>Johannes Breuer, Stefan Jünger</span></div>
 <div style="float: right;"><span>ESRA 2021, 2021-07-22</span></div>
 <div style="text-align: center;"><span>Introduction</span></div>
</div>

---

## About us

### Johannes Breuer

.small[
- Senior researcher in the team Data Augmentation, Department Survey Data Curation, [*GESIS - Leibniz Institute for the Social Sciences*](https://www.gesis.org/en/home), Cologne, Germany
- (Co-)Leader of the team Research Data & Methods at the [*Center for Advanced Internet Studies*](https://www.cais.nrw/en/center-for-advanced-internet-studies-cais-en/) (CAIS), Bochum, Germany
- Main areas:
  - digital trace data for social science research
  - data linking (surveys + digital trace data)
- Ph.D. in Psychology, University of Cologne
- Previously worked in several research projects investigating the use and effects of digital media (Cologne, Hohenheim, Münster, Tübingen)
- Other research interests
  - Computational methods
  - Data management
  - Open science

[johannes.breuer@gesis.org](mailto:johannes.breuer@gesis.org) | [@MattEagle09](https://twitter.com/MattEagle09) | [personal website](https://www.johannesbreuer.com/)
]

---

## About us

### Stefan Jünger

.pull-left[
<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\stefan.png" width="50%" style="display: block; margin: auto;" />
]

.pull-right[
- Postdoctoral researcher in the team Data Augmentation at the GESIS department Survey Data Curation
- Ph.D.
in social sciences, University of Cologne
]

- Research interests:
  - quantitative methods & Geographic Information Systems (GIS)
  - social inequalities & attitudes towards minorities
  - data management & data privacy
  - reproducible research

.small[
[stefan.juenger@gesis.org](mailto:stefan.juenger@gesis.org) | [@StefanJuenger](https://twitter.com/StefanJuenger) | [https://stefanjuenger.github.io](https://stefanjuenger.github.io)
]

---

## About you

Please use the text chat to introduce yourself:

- What's your name?
- Where do you work?
- What do you work on?
- What are your experiences with `R` and the `tidyverse`?
- What are your motivations for joining this course? What are your expectations for this course?

---

## Prerequisites for this course

.large[
- Working versions of `R` and *RStudio*
- Some basic knowledge of `R`
- The `tidyverse` packages
]

---

## Workshop Structure & Materials

- The workshop consists of a combination of short lectures and hands-on exercises

- Slides and other materials are available at

.center[`https://github.com/jobreu/tidyverse-workshop-esra-2021`]

---

## Course schedule

<table>
 <thead>
  <tr>
   <th style="text-align:center;"> When? </th>
   <th style="text-align:center;"> What?
   </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 13:00 - 13:20 </td>
   <td style="text-align:center;"> Introduction: Welcome to the tidyverse </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 13:20 - 13:30 </td>
   <td style="text-align:center;"> Exercise 1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 13:30 - 13:45 </td>
   <td style="text-align:center;"> Data Import </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 13:45 - 14:00 </td>
   <td style="text-align:center;"> Exercise 2 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 14:00 - 14:30 </td>
   <td style="text-align:center;"> Data Wrangling - Part 1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 14:30 - 14:45 </td>
   <td style="text-align:center;"> Exercise 3 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 14:45 - 15:00 </td>
   <td style="text-align:center;"> <i>Coffee break</i> </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 15:00 - 15:30 </td>
   <td style="text-align:center;"> Data Wrangling - Part 2 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 15:30 - 15:45 </td>
   <td style="text-align:center;"> Exercise 4 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 15:45 - 16:00 </td>
   <td style="text-align:center;"> Wrap-Up </td>
  </tr>
</tbody>
</table>

---

## Online format

- If possible, we invite you to turn on your camera
- If you have an immediate question during the lecture parts, please send it via text chat
  - Public or private (ideally to the person currently not presenting if you want an immediate response)
- If you have a question that is not urgent and might be interesting for everybody, you can also use audio (& video) to ask it during the exercise parts
- We would also kindly ask you to mute your microphones when you are not asking (or answering) a question

---

## What is the `tidyverse`?

> The `tidyverse` is an .highlight[opinionated collection of R packages designed for data science].
All packages share an .highlight[underlying design philosophy, grammar, and data structures] ([Tidyverse website](https://www.tidyverse.org/)).

> The `tidyverse` is a .highlight[coherent system of packages for data manipulation, exploration and visualization] that share a .highlight[common design philosophy] ([Rickert, 2017](https://rviews.rstudio.com/2017/06/08/what-is-the-tidyverse/)).

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\hex-tidyverse.png" width="25%" style="display: block; margin: auto;" />

---

## Benefits of the `tidyverse`

.large[
Most of the things we are going to show you can also be achieved with base `R`. However, the base `R` syntax for this is typically more verbose and less intuitive, which makes it harder to learn, remember, and read. In addition, many `tidyverse` operations are faster than their base `R` equivalents.
]

---

## Benefits of the `tidyverse`

.large[
`Tidyverse` syntax is designed to increase **human-readability**. This makes it especially **attractive for `R` novices** as it can facilitate the experience of **self-efficacy** (see [Robinson, 2017](http://varianceexplained.org/r/teach-tidyverse/)).

The `tidyverse` also aims for **consistency** (e.g., data frame as first argument and output) and uses **smarter defaults** (e.g., no partial matching of data frame and column names).
]

---

## `tidyverse` for `R` beginners

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\DistractedBf.png" width="75%" style="display: block; margin: auto;" />

---

## Workflow

.center[
<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\data-science.png" width="60%" style="display: block; margin: auto;" />
]

<small><small>Source: http://r4ds.had.co.nz/</small></small>

.highlight[- **Import**: read in data in different formats (e.g., .csv, .xls, .sav, .dta)
- **Tidy**: clean data (1 row = 1 case, 1 column = 1 variable), rename & recode variables, etc.
- **Transform**: prepare data for analysis (e.g., by aggregating and/or filtering)]
- **Visualize**: explore/analyze data through informative plots
- **Model**: analyze the data by creating models (e.g., linear regression model)
- **Communicate**: present the results (to others)

---

## `Tidyverse` workflow

.center[
<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\tidyverse-1200x484.png" width="1600" style="display: block; margin: auto;" />
]

<small><small>Source: http://www.storybench.org/getting-started-with-tidyverse-in-r/</small></small>

---

## Lift-off into the `tidyverse` 🚀

**Install all `tidyverse` packages** (for the full list of `tidyverse` packages see [https://www.tidyverse.org/packages/](https://www.tidyverse.org/packages/))

```r
install.packages("tidyverse")
```

**Load core `tidyverse` packages** (NB: To save time and reduce namespace conflicts, it can make sense to load the `tidyverse` packages individually)

```r
library("tidyverse")
```

---

## `tidyverse` vocab 101

We will focus on three key things here:

1. Tidy data
2. Tibbles
3. Pipes

---

## Tidy data

The 3 rules of tidy data:

1. Each **variable** is in a separate **column**.
2. Each **observation** is in a separate **row**.
3. Each **value** is in a separate **cell**.
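
As a rough illustration of these rules, here is a minimal sketch with made-up toy data. The object names `wide` and `long` and the columns `id`, `trust_w1`, and `trust_w2` are hypothetical and not part of the workshop dataset. In the wide version, the survey wave is hidden in the column names, so one variable is spread over two columns. Reshaping with `pivot_longer()` gives the wave its own column.

```r
library(tidyverse)

# Hypothetical toy data: the survey wave is encoded in the column names,
# so the variable "wave" does not have a column of its own.
wide <- tibble(
  id       = c(1, 2, 3),
  trust_w1 = c(4, 2, 5),
  trust_w2 = c(3, 2, NA)
)

# Reshape so that each row is one respondent-wave observation
# and each variable (id, wave, trust) has exactly one column.
long <- pivot_longer(
  wide,
  cols         = starts_with("trust_"),
  names_to     = "wave",
  names_prefix = "trust_",
  values_to    = "trust"
)

long
```

Whether the wide or the long layout counts as tidy depends on what you treat as an observation, so this is only one possible reading of the rules.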
<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\tidy_data.png" width="2560" style="display: block; margin: auto;" /> Source: https://r4ds.had.co.nz/tidy-data.html *NB*: In the `tidyverse` terminology 'tidy data' usually also means data in long format (where applicable). --- ## Wide vs. long format <img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\wide-long.png" width="90%" style="display: block; margin: auto;" /> Source: https://github.com/gadenbuie/tidyexplain#tidy-data --- ## Tibbles .pull-left[ Tibbles are basically just `R data.frames` but nicer. - only the first ten observations are printed - output is tidier! - you get some additional metadata about rows and columns that you would normally only get when using `dim()` and other functions You can check the [tibble vignette](https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html) for technical details. ] .pull-right[ <img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\tibble.png" width="60%" style="display: block; margin: auto;" /> ] --- ## A `data.frame` .tiny[ ``` ## cohort sex age_cat education_cat intention_to_vote choice_of_party political_orientation marstat household hzcy001a hzcy002a ## 1 2 1 10 3 2 98 7 1 2 4 6 ## 2 1 2 2 3 2 5 3 2 2 4 6 ## 3 1 1 8 1 2 2 7 1 2 2 2 ## 4 2 2 1 3 2 98 1 2 3 NA NA ## 5 3 2 7 3 2 5 2 2 2 6 6 ## 6 2 2 7 2 2 1 2 1 2 4 4 ## 7 1 2 7 3 2 5 3 1 1 4 4 ## 8 2 1 7 3 NA NA 3 1 3 NA NA ## 9 2 2 8 3 2 98 3 1 2 NA NA ## hzcy003a hzcy004a hzcy005a hzcy006a hzcy007a hzcy008a hzcy009a hzcy010a hzcy011a hzcy012a hzcy013a hzcy014a hzcy015a hzcy016a ## 1 3 6 4 1 1 0 0 0 1 0 1 1 0 0 ## 2 6 6 4 1 1 0 0 1 1 1 0 1 0 0 ## 3 2 2 2 1 1 1 0 0 1 1 0 1 0 0 ## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 5 4 6 6 1 0 0 0 0 1 1 0 1 0 0 ## 6 3 4 4 1 1 0 0 0 1 1 0 1 0 0 ## 7 3 4 4 1 1 1 0 0 1 1 0 1 0 0 ## 8 NA NA NA NA NA 
NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## hzcy018a hzcy019a hzcy020a hzcy021a hzcy022a hzcy023a hzcy024a hzcy025a hzcy026a hzcy027a hzcy028a hzcy029a hzcy030a hzcy031a ## 1 0 4 4 4 4 4 2 2 1 4 2 3 3 3 ## 2 0 5 5 5 5 5 5 5 1 5 2 5 5 5 ## 3 0 5 5 5 5 5 5 5 1 5 3 5 5 5 ## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 5 0 4 4 4 4 4 4 4 1 3 2 4 4 4 ## 6 0 NA 4 4 4 5 2 2 1 4 3 4 4 4 ## 7 0 5 5 5 5 5 5 5 1 5 4 5 5 5 ## 8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## hzcy032a hzcy033a hzcy034a hzcy035a hzcy036a hzcy037a hzcy038a hzcy039a hzcy040a hzcy041a hzcy042a hzcy043a hzcy044a hzcy045a ## 1 3 NA NA NA NA NA NA NA 2 5 2 2 2 4 ## 2 5 NA NA NA NA NA NA NA 2 3 2 3 4 5 ## 3 5 NA NA NA NA NA NA NA 2 2 2 2 4 4 ## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 5 4 NA NA NA NA NA NA NA 3 3 3 3 5 4 ## 6 4 NA NA NA NA NA NA NA 3 3 2 3 4 4 ## 7 5 NA NA NA NA NA NA NA 2 2 1 3 4 3 ## 8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## hzcy046a hzcy047a hzcy048a hzcy049a hzcy050a hzcy051a hzcy052a hzcy053a hzcy054a hzcy055a hzcy056a hzcy057a hzcy058a hzcy059a ## 1 4 5 5 5 5 5 5 5 NA NA NA NA NA NA ## 2 5 5 5 5 5 5 5 1 0 0 0 0 0 0 ## 3 4 4 4 4 4 4 4 1 0 0 0 0 0 0 ## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 5 4 4 4 4 4 4 5 1 0 0 0 0 0 0 ## 6 4 5 4 4 4 4 98 1 0 0 0 0 0 0 ## 7 3 4 4 4 4 4 4 1 0 0 1 0 0 0 ## 8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## hzcy060a hzcy061a hzcy062a hzcy063a hzcy064a hzcy065a hzcy066a hzcy067a hzcy068a hzcy069a hzcy070a hzcy071a hzcy072a hzcy073a ## 1 NA NA NA NA NA NA NA NA NA NA NA 2 NA NA ## 2 1 NA NA NA NA NA NA NA NA NA NA 2 NA NA ## 3 1 NA NA NA NA NA NA NA NA NA NA 2 NA NA ## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 5 1 NA NA NA NA NA NA NA NA NA NA 2 NA NA ## 6 1 NA NA NA NA NA NA NA NA NA NA 2 NA NA ## 7 0 NA NA NA NA NA NA NA NA NA NA 2 NA NA ## 8 NA NA NA NA NA NA 
NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## hzcy074a hzcy075a hzcy076a hzcy077a hzcy078a hzcy079a hzcy080a hzcy081a hzcy083a hzcy084a hzcy085a hzcy086a hzcy087a hzcy088a ## 1 NA NA NA NA NA NA NA NA NA 1 1 0 1 1 ## 2 NA NA NA NA NA NA NA NA NA 1 0 1 0 0 ## 3 NA NA NA NA NA NA NA NA NA 1 1 0 0 0 ## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 5 NA NA NA NA NA NA NA NA NA 1 0 1 0 0 ## 6 NA NA NA NA NA NA NA NA NA 1 1 0 1 0 ## 7 NA NA NA NA NA NA NA NA NA 1 0 1 0 0 ## 8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA ## hzcy089a hzcy090a hzcy091a hzcy092a hzcy093a hzcy095a hzcy096a hzcy097a hzcy098a hzcy099a hzza003a hzzq009a hzzq023a hzzp201a ## 1 0 0 0 1 0 0 NA NA NA NA 1 5 5 31 ## 2 0 0 0 0 0 0 NA NA NA NA 1 5 5 31 ## 3 1 0 0 0 0 0 NA NA NA NA 1 4 4 31 ## 4 NA NA NA NA NA NA NA NA NA NA 0 NA NA NA ## 5 0 0 0 1 0 0 NA NA NA NA 1 4 5 31 ## 6 1 0 1 0 0 0 NA NA NA NA 1 4 4 31 ## 7 0 1 0 1 0 0 1 1 0 1 1 4 4 31 ## 8 NA NA NA NA NA NA NA NA NA NA 0 NA NA NA ## 9 NA NA NA NA NA NA NA NA NA NA 0 NA NA NA ## hzzp204a hzzp207a ## 1 651 1585223562 ## 2 336 1584510380 ## 3 405 1585348329 ## 4 NA NA ## 5 411 1584468409 ## 6 443 1584968090 ## 7 412 1585408051 ## 8 NA NA ## 9 NA NA ## [ reached 'max' / getOption("max.print") -- omitted 3756 rows ] ``` ] --- ## A `tibble` .tiny[ ``` ## # A tibble: 3,765 x 111 ## cohort sex age_cat education_cat intention_to_vote choice_of_party political_orient~ marstat household hzcy001a hzcy002a hzcy003a ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 2 1 10 3 2 98 7 1 2 4 6 3 ## 2 1 2 2 3 2 5 3 2 2 4 6 6 ## 3 1 1 8 1 2 2 7 1 2 2 2 2 ## 4 2 2 1 3 2 98 1 2 3 NA NA NA ## 5 3 2 7 3 2 5 2 2 2 6 6 4 ## 6 2 2 7 2 2 1 2 1 2 4 4 3 ## 7 1 2 7 3 2 5 3 1 1 4 4 3 ## 8 2 1 7 3 NA NA 3 1 3 NA NA NA ## 9 2 2 8 3 2 98 3 1 2 NA NA NA ## 10 2 2 6 2 NA 98 5 1 2 4 6 3 ## # ... 
with 3,755 more rows, and 99 more variables: hzcy004a <dbl>, hzcy005a <dbl>, hzcy006a <dbl>, hzcy007a <dbl>, hzcy008a <dbl>, ## # hzcy009a <dbl>, hzcy010a <dbl>, hzcy011a <dbl>, hzcy012a <dbl>, hzcy013a <dbl>, hzcy014a <dbl>, hzcy015a <dbl>, hzcy016a <dbl>, ## # hzcy018a <dbl>, hzcy019a <dbl>, hzcy020a <dbl>, hzcy021a <dbl>, hzcy022a <dbl>, hzcy023a <dbl>, hzcy024a <dbl>, hzcy025a <dbl>, ## # hzcy026a <dbl>, hzcy027a <dbl>, hzcy028a <dbl>, hzcy029a <dbl>, hzcy030a <dbl>, hzcy031a <dbl>, hzcy032a <dbl>, hzcy033a <dbl>, ## # hzcy034a <dbl>, hzcy035a <dbl>, hzcy036a <dbl>, hzcy037a <dbl>, hzcy038a <dbl>, hzcy039a <dbl>, hzcy040a <dbl>, hzcy041a <dbl>, ## # hzcy042a <dbl>, hzcy043a <dbl>, hzcy044a <dbl>, hzcy045a <dbl>, hzcy046a <dbl>, hzcy047a <dbl>, hzcy048a <dbl>, hzcy049a <dbl>, ## # hzcy050a <dbl>, hzcy051a <dbl>, hzcy052a <dbl>, hzcy053a <dbl>, hzcy054a <dbl>, hzcy055a <dbl>, hzcy056a <dbl>, hzcy057a <dbl>, ## # hzcy058a <dbl>, hzcy059a <dbl>, hzcy060a <dbl>, hzcy061a <dbl>, hzcy062a <dbl>, hzcy063a <dbl>, hzcy064a <dbl>, hzcy065a <dbl>, ## # hzcy066a <dbl>, hzcy067a <dbl>, hzcy068a <dbl>, hzcy069a <dbl>, hzcy070a <dbl>, hzcy071a <dbl>, hzcy072a <dbl>, hzcy073a <dbl>, ## # hzcy074a <dbl>, hzcy075a <dbl>, hzcy076a <dbl>, hzcy077a <dbl>, hzcy078a <dbl>, hzcy079a <dbl>, hzcy080a <dbl>, hzcy081a <dbl>, ## # hzcy083a <dbl>, hzcy084a <dbl>, hzcy085a <dbl>, hzcy086a <dbl>, hzcy087a <dbl>, hzcy088a <dbl>, hzcy089a <dbl>, hzcy090a <dbl>, ## # hzcy091a <dbl>, hzcy092a <dbl>, hzcy093a <dbl>, hzcy095a <dbl>, hzcy096a <dbl>, hzcy097a <dbl>, hzcy098a <dbl>, hzcy099a <dbl>, ## # hzza003a <dbl>, hzzq009a <dbl>, hzzq023a <dbl>, hzzp201a <dbl>, hzzp204a <dbl>, hzzp207a <dbl> ``` ] --- ## Converting dataframes into tibbles You can convert any `data.frame` into a `tibble`: ```r gpc <- as.data.frame(gpc) tibble::as_tibble(gpc) ``` .tiny[ ``` ## # A tibble: 3,765 x 111 ## cohort sex age_cat education_cat intention_to_vote choice_of_party political_orient~ marstat household hzcy001a 
hzcy002a hzcy003a ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 2 1 10 3 2 98 7 1 2 4 6 3 ## 2 1 2 2 3 2 5 3 2 2 4 6 6 ## 3 1 1 8 1 2 2 7 1 2 2 2 2 ## 4 2 2 1 3 2 98 1 2 3 NA NA NA ## 5 3 2 7 3 2 5 2 2 2 6 6 4 ## 6 2 2 7 2 2 1 2 1 2 4 4 3 ## 7 1 2 7 3 2 5 3 1 1 4 4 3 ## 8 2 1 7 3 NA NA 3 1 3 NA NA NA ## 9 2 2 8 3 2 98 3 1 2 NA NA NA ## 10 2 2 6 2 NA 98 5 1 2 4 6 3 ## # ... with 3,755 more rows, and 99 more variables: hzcy004a <dbl>, hzcy005a <dbl>, hzcy006a <dbl>, hzcy007a <dbl>, hzcy008a <dbl>, ## # hzcy009a <dbl>, hzcy010a <dbl>, hzcy011a <dbl>, hzcy012a <dbl>, hzcy013a <dbl>, hzcy014a <dbl>, hzcy015a <dbl>, hzcy016a <dbl>, ## # hzcy018a <dbl>, hzcy019a <dbl>, hzcy020a <dbl>, hzcy021a <dbl>, hzcy022a <dbl>, hzcy023a <dbl>, hzcy024a <dbl>, hzcy025a <dbl>, ## # hzcy026a <dbl>, hzcy027a <dbl>, hzcy028a <dbl>, hzcy029a <dbl>, hzcy030a <dbl>, hzcy031a <dbl>, hzcy032a <dbl>, hzcy033a <dbl>, ## # hzcy034a <dbl>, hzcy035a <dbl>, hzcy036a <dbl>, hzcy037a <dbl>, hzcy038a <dbl>, hzcy039a <dbl>, hzcy040a <dbl>, hzcy041a <dbl>, ## # hzcy042a <dbl>, hzcy043a <dbl>, hzcy044a <dbl>, hzcy045a <dbl>, hzcy046a <dbl>, hzcy047a <dbl>, hzcy048a <dbl>, hzcy049a <dbl>, ## # hzcy050a <dbl>, hzcy051a <dbl>, hzcy052a <dbl>, hzcy053a <dbl>, hzcy054a <dbl>, hzcy055a <dbl>, hzcy056a <dbl>, hzcy057a <dbl>, ## # hzcy058a <dbl>, hzcy059a <dbl>, hzcy060a <dbl>, hzcy061a <dbl>, hzcy062a <dbl>, hzcy063a <dbl>, hzcy064a <dbl>, hzcy065a <dbl>, ## # hzcy066a <dbl>, hzcy067a <dbl>, hzcy068a <dbl>, hzcy069a <dbl>, hzcy070a <dbl>, hzcy071a <dbl>, hzcy072a <dbl>, hzcy073a <dbl>, ## # hzcy074a <dbl>, hzcy075a <dbl>, hzcy076a <dbl>, hzcy077a <dbl>, hzcy078a <dbl>, hzcy079a <dbl>, hzcy080a <dbl>, hzcy081a <dbl>, ## # hzcy083a <dbl>, hzcy084a <dbl>, hzcy085a <dbl>, hzcy086a <dbl>, hzcy087a <dbl>, hzcy088a <dbl>, hzcy089a <dbl>, hzcy090a <dbl>, ## # hzcy091a <dbl>, hzcy092a <dbl>, hzcy093a <dbl>, hzcy095a <dbl>, hzcy096a <dbl>, hzcy097a <dbl>, hzcy098a <dbl>, hzcy099a <dbl>, 
## # hzza003a <dbl>, hzzq009a <dbl>, hzzq023a <dbl>, hzzp201a <dbl>, hzzp204a <dbl>, hzzp207a <dbl>
```
]

---

## The logic of pipes

Usually, in `R` we apply functions as follows:

```r
f(x)
```

In the logic of pipes this function is written as:

```r
x %>% f(.)
```

--

We can use pipes on more than one function:

```r
x %>%
  f_1() %>%
  f_2() %>%
  f_3()
```

More details: https://r4ds.had.co.nz/pipes.html

---

## Pipes everywhere...

```r
library(memer)
meme_get("OprahGiveaway") %>%
  meme_text_bottom("EVERYONE GETS A %>%!!!", size = 36)
```

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\tidyverse-workshop-esra-2021\content\img\OprahGiveaway.png" width="60%" style="display: block; margin: auto;" />

---

## Resources

There are hundreds of tutorials, courses, blog posts, etc. about the `tidyverse` available online.

The book [*R for Data Science*](https://r4ds.had.co.nz/) by [Hadley Wickham](http://hadley.nz/) and [Garrett Grolemund](https://twitter.com/statgarrett) (which is available for free online) provides a very comprehensive introduction to the `tidyverse`.

The weekly [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday) data projects and the associated [#tidytuesday Twitter hashtag](https://twitter.com/hashtag/tidytuesday?lang=en) are also a fun way of learning and practicing data wrangling and exploration with the `tidyverse`.

---

## Cheatsheets

*RStudio* offers a good collection of [cheatsheets for R](https://www.rstudio.com/resources/cheatsheets/). The following two are of particular interest for this workshop:

- [Data Import Cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf)

- [Data Transformation Cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)

---

class: center, middle

# [Exercise](https://jobreu.github.io/tidyverse-workshop-esra-2021/exercises/Exercise_1.html) time 🏋️‍♀️💪🏃🚴

## [Solutions](https://jobreu.github.io/tidyverse-workshop-esra-2021/solutions/Exercise_1.html)
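
---

## Pipes: a small worked example

The pipe logic from the slides on pipes can be made concrete with a minimal sketch. It uses made-up toy data: the object `answers` and its values are hypothetical, and only the column names `sex` and `age_cat` are borrowed from the example output shown earlier. The nested and the piped version do the same thing. The piped one simply reads top to bottom.

```r
library(dplyr)

# Hypothetical toy data standing in for a small survey extract
answers <- tibble::tibble(
  sex     = c(1, 2, 2, 1, 2),
  age_cat = c(10, 2, 8, 1, 7)
)

# Nested calls: have to be read from the inside out
nested <- summarise(filter(answers, sex == 2), mean_age_cat = mean(age_cat))

# Piped calls: the result of each step is passed on as the
# first argument of the next function
piped <- answers %>%
  filter(sex == 2) %>%
  summarise(mean_age_cat = mean(age_cat))

# Check that both versions give the same result
all.equal(nested, piped)
```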